High Performance Transaction Processing on Non-Uniform Hardware Topologies
Author
Abstract
Transaction processing is a mission-critical enterprise application that runs on high-end servers. Traditionally, transaction processing systems have been designed for uniform core-to-core communication latencies. In the past decade, with the emergence of multisocket multicores, for the first time we have Islands, i.e., groups of cores that communicate fast among themselves and slower with other groups. In current mainstream servers, each multicore processor corresponds to an Island. As the number of cores on a chip increases, however, we expect that multiple Islands will form within a single processor in the near future. In addition, the access latencies to the local memory and to the memory of another server over a fast interconnect are converging, thus creating a hierarchy of Islands within a group of servers. .... Non-uniform hardware topologies pose a significant challenge to the scalability and the predictability of performance of transaction processing systems. Distributed transaction processing systems can alleviate this problem; however, no single deployment configuration is optimal for all workloads and hardware topologies. In order to fully utilize the available processing power, a transaction processing system needs to adapt to the underlying hardware topology and tune its configuration to the current workload. More specifically, the system should be able to detect any changes to the workload and hardware topology, and adapt accordingly without disrupting the processing. .... In this thesis, we first systematically quantify the impact of hardware Islands on deployment configurations of distributed transaction processing systems. We show that none of these configurations is optimal for all workloads, and the choice of the optimal configuration depends on the combination of the workload and hardware topology.
In the cluster setting, on the other hand, the choice of optimal configuration additionally depends on the properties of the communication channel between the servers. We address this challenge by designing a dynamic shared-everything system that adapts its data structures automatically to hardware Islands. To ensure good performance in the presence of shifting workload patterns, we use a lightweight partitioning and placement mechanism to balance the load and minimize the synchronization overheads across Islands. .... Overall, we show that masking the non-uniformity of inter-core communication is critical for achieving predictably high performance for latency-sensitive applications, such as trans-
Similar resources
Smoothing non-uniform communication latencies for OLTP
Transaction processing applications traditionally run on the high-end servers. Up until recently, such servers had uniform core-to-core communication latencies. Now with multisocket multicores, for the first time we have Islands, i.e., groups of cores that communicate very fast with cores that belong to the same group and several times slower with cores from other groups. In current mainstream ...
Reducing Ownership Overhead for Load-Store Sequences in Cache-Coherent Multiprocessors
Parallel programs that modify shared data in a cache-coherent multiprocessor with a write-invalidate coherence protocol create ownership overhead in the form of ownership acquisitions at writes to shared data. This can have a significant impact on performance in a cache-coherent non-uniform memory architecture (NUMA) multiprocessor. By combining a read-request and an ownership acquisition, the w...
OLTP on Hardware Islands
Modern hardware is abundantly parallel and increasingly heterogeneous. The numerous processing cores have nonuniform access latencies to the main memory and to the processor caches, which causes variability in the communication costs. Unfortunately, database systems mostly assume that all processing cores are the same and that microarchitecture differences are not significant enough to appear i...
Performance analysis on a CC-NUMA
Cache-coherent nonuniform memory access (CC-NUMA) machines have been shown to be a promising paradigm for exploiting distributed execution. CC-NUMA systems can provide performance typically associated with parallel machines, without the high cost associated with parallel programming. This is because a single image of memory is provided on a CC-NUMA machine. Past research on CC-NUMA machines has ...
ThunderGeckoMonkey: An Energy-Aware High Performance Secure Computing System
This paper presents ThunderGeckoMonkey, a high-performance, energy-efficient processor aimed at On-Line Transaction Processing (OLTP) workloads. ThunderGeckoMonkey also offers enhanced, transparent fault tolerance and hardware support for a wide range of common encryption and hashing algorithms. A chip multiprocessor, ThunderGeckoMonkey is based on 4-issue out-of-order cores, each utilizing adva...